Approximate Policy Iteration: A Survey and Some New Methods

Author

  • Dimitri P. Bertsekas
Abstract

We consider the classical policy iteration method of dynamic programming (DP), where approximations and simulation are used to deal with the curse of dimensionality. We survey a number of issues: convergence and rate of convergence of approximate policy evaluation methods, singularity and susceptibility to simulation noise of policy evaluation, exploration issues, constrained and enhanced policy iteration, policy oscillation and chattering, and optimistic and distributed policy iteration. Our discussion of policy evaluation is couched in general terms, and aims to unify the available methods in the light of recent research developments, and to compare the two main policy evaluation approaches: projected equations and temporal differences (TD), and aggregation. In the context of these approaches, we survey two different types of simulation-based algorithms: matrix inversion methods such as LSTD, and iterative methods such as LSPE and TD(λ), and their scaled variants. We discuss a recent method, based on regression and regularization, which rectifies the unreliability of LSTD for nearly singular projected Bellman equations. An iterative version of this method belongs to the LSPE class of methods, and provides the connecting link between LSTD and LSPE. Our discussion of policy improvement focuses on the role of policy oscillation and its effect on performance guarantees. We illustrate that policy evaluation when done by the projected equation/TD approach may lead to policy oscillation, but when done by aggregation it does not. This implies better error bounds and more regular performance for aggregation, at the expense of some loss of generality in cost function representation capability. Hard aggregation provides the connecting link between projected equation/TD-based and aggregation-based policy evaluation, and is characterized by favorable error bounds.
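As a rough illustration of the matrix inversion approach mentioned in the abstract, the sketch below estimates the projected equation C r = d from a simulated trajectory (the λ = 0 case), solves it directly as in LSTD, and also solves a regularized least-squares variant of the kind one might use when C is nearly singular. All function names, the choice of an identity weighting matrix, and the zero prior guess are assumptions made for this example; the paper's own formulation may differ.

```python
import numpy as np

def simulate_chain(P, n_steps, rng, start=0):
    """Sample a state trajectory i_0, ..., i_n from transition matrix P."""
    states = np.empty(n_steps + 1, dtype=int)
    states[0] = start
    for t in range(n_steps):
        states[t + 1] = rng.choice(P.shape[0], p=P[states[t]])
    return states

def lstd_matrices(states, Phi, g, alpha):
    """Simulation-based estimates of C and d in the projected equation C r = d.

    Phi   : (num_states, num_features) feature matrix (hypothetical name)
    g     : per-state cost vector (assumes cost depends on the current state only)
    alpha : discount factor
    """
    cur, nxt = Phi[states[:-1]], Phi[states[1:]]
    n = len(states) - 1
    C = cur.T @ (cur - alpha * nxt) / n
    d = cur.T @ g[states[:-1]] / n
    return C, d

def lstd(C, d):
    """Plain LSTD: solve C r = d directly (unreliable if C is nearly singular)."""
    return np.linalg.solve(C, d)

def regularized_lstd(C, d, beta=1e-6, r_prior=None):
    """Regularized regression: minimize ||C r - d||^2 + beta * ||r - r_prior||^2.

    Assumes an identity weighting matrix and, by default, a zero prior guess.
    """
    if r_prior is None:
        r_prior = np.zeros(C.shape[1])
    A = C.T @ C + beta * np.eye(C.shape[1])
    b = C.T @ d + beta * r_prior
    return np.linalg.solve(A, b)
```

With one indicator feature per state (Phi equal to the identity) there is no approximation error, so both estimates should approach the exact cost vector (I - αP)^{-1} g as the trajectory grows. The regularization parameter beta trades a small bias toward r_prior for reduced sensitivity to simulation noise.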

Similar resources

A new iteration method for solving a class of Hammerstein type integral equations system

In this work, a new iterative method is proposed for obtaining the approximate solution of a class of systems of Hammerstein-type integral equations. The main structure of this method is based on the Richardson iterative method for solving an algebraic linear system of equations. Some conditions for the existence and uniqueness of the solution of this type of equation are imposed. Convergence analysis and error bou...

Some New Existence, Uniqueness and Convergence Results for Fractional Volterra-Fredholm Integro-Differential Equations

This paper studies some significant recent innovations in approximation techniques for finding approximate solutions of Caputo fractional Volterra-Fredholm integro-differential equations. To this aim, the study uses the modified Adomian decomposition method (MADM) and the modified variational iteration method (MVIM). The wider applicability of these techniques is based on thei...

Some New Analytical Techniques for Duffing Oscillator with Very Strong Nonlinearity

The current paper focuses on some analytical techniques for solving the nonlinear Duffing oscillator with large nonlinearity. Four different methods have been applied to solve the equation of motion: the variational iteration method, He's parameter expanding method, the parameterized perturbation method, and the homotopy perturbation method. The results reveal that approxim...

Numerical solution of the system of Volterra integral equations of the first kind

This paper presents a comparison between the variational iteration method (VIM) and the modified variational iteration method (MVIM) for the approximate solution of a system of Volterra integral equations of the first kind. We convert a system of Volterra integral equations to a system of Volterra integro-differential equations, then use VIM and MVIM to approximate the solution of this system and hence obtain an appr...

Approximate modified policy iteration and its application to the game of Tetris

Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods. Despite its generality, MPI has not been thoroughly studied, especially its approximation form which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that are exten...

Publication date: 2010